{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h1> Vaccine Design </h1>\n",
    "\n",
    "\n",
    "This tutorial illustrates the use of Fred2 for population optimized epitope-based vaccines using OptiTope <a href=http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1000246>(Toussaint, et al. (2008) PLoS Comput Biol, 4(12))</a>\n",
    "\n",
    "For this tutorial to work, you have to install a MIP solver that is supported by Pyomo like CPLEX, GLPK, or CBC. \n",
    "\n",
    "This tutorial will entail:\n",
    "- Simple epitope prediction from a list of peptide sequences and protein sequences\n",
    "- Simple epitope selection\n",
    "- Activation/deactivation of optional constraints\n",
    "- String-of-Beads design\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "%reload_ext autoreload\n",
    "%autoreload 2\n",
    "\n",
    "from Fred2.Core import Allele, Peptide, Protein,generate_peptides_from_proteins\n",
    "from Fred2.IO import read_lines, read_fasta\n",
    "from Fred2.EpitopePrediction import EpitopePredictorFactory\n",
    "from Fred2.EpitopeSelection import OptiTope\n",
    "from Fred2.EpitopeAssembly import EpitopeAssembly, EpitopeAssemblyWithSpacer"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h2> Chapter 1: Simple Epitope Prediction and preparation </h2>\n",
    "    <br/>\n",
    "Epitope selection is the most important step for vaccine design. It is concerned with selecting a small set of candidate epitopes to maximize the probability of inducing a long lasting and strong immune response. Epitope Selection is an interface to OptiTope a highly flexible mathematical framework for epitope selection. OptiTope determines the provably optimal epitope set that maximizes the overall immunogenicity for a target population or a single person and user specified requirements.\n",
    " \n",
    "\n",
    "We first start with reading in proteins from a FASTA file and constructing peptides from these. Then we have to construct a list of alleles to represent our target population and assign their probability of occurring in that targeted population.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "proteins = read_fasta(\"./data/proteins.fasta\", id_position=3, in_type=Protein)\n",
    "peptides = generate_peptides_from_proteins(proteins,9)\n",
    "\n",
    "alleles = []\n",
    "with open(\"./data/allele_probabilities_europe.csv\", \"r\") as f:\n",
    "    for l in f:\n",
    "        allele,prob = l.split()\n",
    "        alleles.append(Allele(allele,prob=float(prob.strip())))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With that in place, we can start predicting the immunogenicity of each peptide HLA allele pair. Since the prediction of immunogenicity is an unsolved problem as of yet, the IC50 value of an epitope HLA allele pair is usually used as approximation. In this tutorial we use `BIMAS` as prediction engine. In additional, we can specify an allele specific binding threshold that is used internally to filter for specific constraints.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "bimas = EpitopePredictorFactory(\"BIMAS\")\n",
    "supported_alleles = filter(lambda a: a.name in bimas.supportedAlleles, alleles)\n",
    "epi_df =bimas.predict(peptides, alleles=supported_alleles)\n",
    "epi_df.head()\n",
    "threshold = {a.name:1 for a in supported_alleles}\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h2> Chapter 2: Epitope Selection </h2>\n",
    "\n",
    "OptiTope defines the overall immunogenicity $I(E)$ of an epitope set $E$ as the sum over the immunogenicity of its components weighted by the HLA allele $a \\in A$ frequencies $p_a$ of the target population $A$. This is a commonly made assumption due to a lack of understanding the interplay of different epitopes within a vaccine.\n",
    "\n",
    "\\begin{align}\n",
    "I(E) = \\sum_{e \\in E}\\sum_{a \\in A} p_a \\cdot i_{e,a}\n",
    "\\end{align}\n",
    "\n",
    "Together with the prediction results, we are ready to use OptiTope. The simplest model OptiTope offers, is the optimal selections only constraint by the number of allowed selected epitopes:\n",
    "\n",
    "\\begin{align}\n",
    "\\max &\\sum_{e \\in E}x_e\\sum_{a \\in A}p_a\\cdot i_{e,a}\\\\\n",
    "\\text{s.t.}\\\\\n",
    "&\\sum_{e \\in E} x_e \\le k\n",
    "\\end{align}\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[PEPTIDE:\n",
       "  AVDPADRCK\n",
       " in PROTEIN: Protein_0, PEPTIDE:\n",
       "  VLDKTKFLV\n",
       " in PROTEIN: Protein_1, PEPTIDE:\n",
       "  DPADRCKEV\n",
       " in PROTEIN: Protein_0]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "optitope = OptiTope(epi_df, k=3, solver=\"cplex\", threshold=threshold)\n",
    "selected_epitope = optitope.solve()\n",
    "selected_epitope"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also activate additional constraints. For example we can restrict the selected epitope set to cover a certain percentage of all HLA alleles:\n",
    "\n",
    "\\begin{align}\n",
    "\\max &\\sum_{e \\in E}x_e\\sum_{a \\in A}p_a\\cdot i_{e,a}\\\\\n",
    "\\text{s.t.}\\\\\n",
    "&\\sum_{e \\in E}x_e \\le k\\\\\n",
    "&\\sum_{e \\in I_a} x_e \\ge y_a~ ~ \\forall a \\in A \\\\\n",
    "&\\sum_{a \\in A} y_a \\ge \\tau^{\\text{HLA}}\n",
    "\\end{align}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[PEPTIDE:\n",
       "  LPVLDKTKF\n",
       " in PROTEIN: Protein_1, PEPTIDE:\n",
       "  AVDPADRCK\n",
       " in PROTEIN: Protein_0, PEPTIDE:\n",
       "  DPADRCKEV\n",
       " in PROTEIN: Protein_0]"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "optitope.activate_allele_coverage_const(0.8)\n",
    "optitope.solve()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "and/or antigens:\n",
    "\n",
    "\\begin{align}\n",
    "\\max& \\sum_{e \\in E}x_e\\sum_{a \\in A}p_a\\cdot i_{e,a}\\\\\n",
    "\\text{s.t.}\\\\\n",
    "&\\sum_{e \\in E}x_e \\le k\\\\\n",
    "&\\sum_{e \\in I_a} x_e \\ge y_a~ ~ \\forall a \\in A \\\\\n",
    "&\\sum_{a \\in A} y_a \\ge \\tau^{\\text{HLA}}\\\\\n",
    "&\\sum_{e \\in I_g} x_e \\ge z_g~ ~ \\forall g \\in G \\\\\n",
    "&\\sum_{g \\in G} z_g \\ge \\tau^{\\text{Antigen}}\n",
    "\\end{align}\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[PEPTIDE:\n",
       "  LPVLDKTKF\n",
       " in PROTEIN: Protein_1, PEPTIDE:\n",
       "  AVDPADRCK\n",
       " in PROTEIN: Protein_0, PEPTIDE:\n",
       "  DPADRCKEV\n",
       " in PROTEIN: Protein_0]"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "optitope.activate_allele_coverage_const(0.8)\n",
    "optitope.activate_antigen_coverage_const(0.8)\n",
    "optitope.solve()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "you can also again deactivate these optional constraints with:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[PEPTIDE:\n",
       "  AVDPADRCK\n",
       " in PROTEIN: Protein_0, PEPTIDE:\n",
       "  VLDKTKFLV\n",
       " in PROTEIN: Protein_1, PEPTIDE:\n",
       "  DPADRCKEV\n",
       " in PROTEIN: Protein_0]"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "optitope.deactivate_allele_coverage_const()\n",
    "optitope.deactivate_antigen_coverage_const()\n",
    "optitope.solve()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "OptiTope  offers other optional constraints like epitope coverage or overlapping constraints. Also, the allele probability could be reused to represent for example the HLA expression of a individual, or can be completely ignored (which results in OptiTope to treat each allele uniformly for each locus)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h2> Chapter 3: Epitope Assembly </h2><br/>\n",
    "Epitope Assembly is concerned with ordering epitopes into a string-of-beads poly-peptide which maximizes the probability that the epitopes will be fully recovered after proteasomal cleavage. This is an important step for vaccine design and has potentially high impact on the efficacy of the designed vaccine. Epitope Assembly formulates the epitope ordering problem as a traveling salesperson problem where epitopes represent the cities to visit and the distances between the cities represent the recovery probabilities.\n",
    "\n",
    "Fred2 offers two alternatives for string-of-beads design. A simple string-of-beads with `EpitopeAssembly.EpitopeAssembly`, and a string-of-beads design with additional spacer sequences connecting the epitopes (`EpitopeAssembly.EpitopeAssembly`).\n",
    "\n",
    "Since these problems are formulated as Traveling salesperson optimization problem, the time to solve an instance can vary dramatically and solving to optimality can be impractical even for small instances. Therefore, Fred2 offers the possibility to approximate a string-of-beads configuration with the Lin-Kernighan traveling salesman heuristic besides an optimal solution. For that, the LKH implementation must be downloaded, compiled, and globally executable. The source code can be found at <a href=http://www.akira.ruc.dk/~keld/research/LKH/>http://www.akira.ruc.dk/~keld/research/LKH/</a>.\n",
    "\n",
    "<h4> Simple String-of-Beads Design </h4><br/>\n",
    "Lets start with the simple approach. For cleavage prediction we use `PCM` but any other cleavage site prediction method available in Fred2 can be used. To access the cleavage site prediction methods we use the `Fred2.CleavagePrediction.CleavageSitePredictorFactory`, similar to the `EpitopePredictorFactory`. All we have to specify is a set of say through OptiTope selected epitopes and a cleavage site predictor. Fred2 will return a list of Peptides based on their optimal ordering.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "SVB:  DPADRCKEV-VLDKTKFLV-AVDPADRCK\n"
     ]
    }
   ],
   "source": [
    "from Fred2.CleavagePrediction import CleavageSitePredictorFactory\n",
    "\n",
    "pcm = CleavageSitePredictorFactory(\"PCM\")\n",
    "SVB = EpitopeAssembly(selected_epitope,pcm,solver=\"cplex\").solve()\n",
    "print \"SVB: \",  \"-\".join(map(str,SVB))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<h4>String-of-Beads Design with Spacer Sequences</h4><br/>\n",
    "Designing spacer sequences within a string-of-beads is more demanding and the model used is restricted to linear prediction function (currently only models derived from `APSSMEpitopePrediction`). \n",
    "\n",
    "Spacer Design for Epitope Assembly is concerned with determining the optimal amino acid composition and length of a small sequence connecting two epitopes within a string-of-beads poly-peptide and the epitope order to maximizes the probability that the epitopes will be fully recovered after proteasomal cleavage with the smallest neo-immunogenicity.\n",
    "Hence, we have to specify a linear epitope predictor besides the cleavage site predictor and a HLA allele set for which the spacer sequences are optimized. Here again, Fred2 offers the possibility to approximate the solution via LKH.\n",
    "\n",
    "We will use PCM as cleavage site and again BIMAS as epitope binding prediction methods and solve the design problem optimally.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "SVB with spacer:  DPADRCKEV-HHH-VLDKTKFLV-HH-AVDPADRCK\n"
     ]
    }
   ],
   "source": [
    "SVB_spacer = EpitopeAssemblyWithSpacer(selected_epitope, pcm, bimas, \n",
    "                                       alleles, threshold=threshold, solver=\"cplex\").solve()\n",
    "print \"SVB with spacer: \", \"-\".join(map(str,SVB_spacer))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
